A Comparative Description of GtoP modules for Portuguese and Mirandese Using Finite State Transducers
Abstract
Mirandese is the second official language of Portugal. For a long time it was preserved only as an oral language, and it was recently the object of an orthographic convention. This paper describes our efforts in porting our grapheme-to-phone (GtoP) module from European Portuguese to Mirandese. We describe the main differences between the two languages that can affect this module and the set of new SAMPA symbols that had to be defined for the phonetic transcription of Mirandese. We then briefly cover our rule formalism and the composition of the various transducers involved in the grapheme-to-phone conversion, and describe the results obtained for the two languages. The use of finite state transducers provided a very flexible and modular framework for deriving and testing new rule sets. Our experience leads us to believe that grapheme-to-phone modules could be helpful tools for researchers involved in establishing orthographic conventions for other orally transmitted languages.
Similar resources
From Portuguese to Mirandese: Fast Porting of a Letter-to-Sound Module Using FSTs
This paper describes our efforts in porting our letter-to-sound module from European Portuguese to Mirandese, the second official language in Portugal. We describe the rule formalism and the composition of the various transducers involved in the letter-to-sound conversion. We propose a set of extra SAMPA symbols to be used in the phonetic transcription of Mirandese, and we briefly cover the set ...
Aligning and recognizing spoken books in different varieties of Portuguese
This paper presents digital spoken books as a useful diagnostic tool for detecting alignment and recognition problems and for studying the porting of these technologies to different varieties of the same language (Portuguese, in our case). We summarize the main differences between European and Brazilian Portuguese (EP/BP) and describe how they affect the GtoP system. Despite the small siz...
Compound Temporal Adverbs in Portuguese and in Spanish
This paper reports on ongoing research on temporal adverbs and deals with the problem of processing a family of Portuguese and Spanish compound temporal adverbs, in a contrastive approach, aiming at building finite state transducers to translate them from one language into the other. Because of the large number of combinations involved and their complexity, it is not easy to list them in ful...
Applying Transducers to Spoken Language Processing for Portuguese
This paper has two different goals. The primary aim is to illustrate the advantages of weighted finite state transducers for spoken language processing, namely in terms of their capacity to efficiently integrate different types of knowledge sources. We have chosen three areas to emphasize several aspects of the application of transducers: large vocabulary continuous speech recognition, automati...
Recovering capitalization and punctuation marks for automatic speech recognition: Case study for Portuguese broadcast news
The following material presents a study about recovering punctuation marks and capitalization information from European Portuguese broadcast news speech transcriptions. Different approaches were tested for capitalization, both generative and discriminative, using finite state transducers automatically built from language models, and maximum entropy models. Several resources were used, includi...
Publication date: 2003